S3 bucket


An S3 bucket is the top-level storage container in Amazon Web Services' Simple Storage Service (S3). It holds data objects, such as files, images, and documents, each identified by a key. Keys can contain slashes, so a bucket often looks like a folder tree, but the namespace underneath is flat. Bucket names must be globally unique, and buckets scale without any capacity planning while providing high durability and availability. Access is controlled through mechanisms such as IAM policies and bucket policies, and objects can be placed in different storage classes depending on access patterns and cost considerations. Data engineers often use S3 buckets as a central repository for raw data and processed datasets, or as the storage layer of a data lake. When working with DuckDB, you can query data stored in S3 buckets directly through the httpfs extension, using syntax like:

SELECT * FROM read_parquet('s3://your-bucket-name/path/to/file.parquet');

DuckDB reads the file over the network, and for Parquet it can fetch only the columns and row groups a query needs, so there is no need to download the whole file locally first.
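Reading from a private bucket usually requires loading S3 support and supplying credentials first. The following is a minimal sketch, assuming a recent DuckDB version that supports `CREATE SECRET`; the secret name `my_s3_secret`, the key values, and the region are placeholders for illustration, not real settings.

```sql
-- Install and load the httpfs extension, which provides S3 support.
INSTALL httpfs;
LOAD httpfs;

-- Register credentials for S3 access.
-- Assumption: all values below are placeholders; substitute your own.
CREATE SECRET my_s3_secret (
    TYPE S3,
    KEY_ID 'YOUR_ACCESS_KEY_ID',
    SECRET 'YOUR_SECRET_ACCESS_KEY',
    REGION 'us-east-1'
);

-- Query every Parquet file under a prefix with a glob pattern.
SELECT count(*)
FROM read_parquet('s3://your-bucket-name/path/to/*.parquet');
```

Publicly readable buckets may not need a secret at all, and the glob pattern is one way to treat many objects under a prefix as a single table.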